Fluviicola taffensis O'Sullivan et al. 2005 belongs to the monotypic genus Fluviicola within
the family Cryomorphaceae. The species is of interest because of its isolated phylogenetic
location in the genome-sequenced fraction of the tree of life. Strain RW262T forms a
monophyletic lineage with uncultivated bacteria represented in freshwater 16S rRNA gene
libraries. A similar phylogenetic differentiation occurs between freshwater and marine
bacteria in the family Flavobacteriaceae, a sister family to Cryomorphaceae. Most remarkable
is the inability of this freshwater bacterium to grow in the presence of Na+ ions. All other
genera in the family Cryomorphaceae are from marine habitats and have an absolute requirement
for Na+ ions or natural sea water. F. taffensis is the first member of the family
Cryomorphaceae with a completely sequenced and publicly available genome. The 4,633,577 bp
long genome with its 4,082 protein-coding and 49 RNA genes is a part of the Genomic
Encyclopedia of Bacteria and Archaea project.



Introduction 

Strain RW262T (= DSM 16823 = NCIMB 13979) is the type strain of the
species Fluviicola taffensis, which is the type species of the
monotypic genus Fluviicola [1], affiliated with the family
Cryomorphaceae [2]. The genus name is derived from the Latin words
fluvius, meaning 'river', and -cola, meaning 'inhabitant, dweller',
yielding the Neo-Latin word Fluviicola, the river dweller [1,3]. The
species epithet is derived from the Neo-Latin word taffensis,
referring to the place where the type strain was isolated, the river
Taff (Wales, UK) [1,3]. The family Cryomorphaceae belongs to the class
Flavobacteria, which contains many species that probably play an
integral role in the flow of carbon and energy in the marine
environment [4].

Flavobacteria are the major decomposers of high-molecular-mass organic
matter in sea water [5]. Phylogenetically, the family Cryomorphaceae is
located between the families Flavobacteriaceae and Bacteroidaceae [2]
and currently comprises the genera Brumimicrobium, Cryomorpha and
Crocinitomix [2], Owenweeksia [6], Wandonia [7], Fluviicola [1] and
Lishizhenia [8]. The family Cryomorphaceae exhibits the greatest degree
of phenotypic similarity to the family Flavobacteriaceae [9] and
includes species with a mostly rod-like to filamentous morphology;
cells are usually non-motile or move by gliding and often contain
carotenoid pigments [1,2]. All members of the Cryomorphaceae are
strictly aerobic or facultatively anaerobic (fermentative) with a
chemoheterotrophic metabolism [1,2] and often have complex growth
requirements, including sea water salts, organic compounds as sole
nitrogen sources, yeast extract and vitamins [2]. To date no further
isolates of F. taffensis have been reported. Here we present a summary
classification and a set of features for F. taffensis RW262T, together
with the description of the complete genomic sequencing and annotation.

Classification and features 

A representative genomic 16S rRNA sequence of F. taffensis RW262T was
compared using NCBI BLAST [10] under default settings (e.g.,
considering only the high-scoring segment pairs (HSPs) from the best
250 hits) with the most recent release of the Greengenes database [11],
and the relative frequencies of taxa and keywords (reduced to their
stem [12]) were determined, weighted by BLAST scores. The most
frequently occurring genera were Brumimicrobium (62.9%) and Fluviicola
(37.1%) (3 hits in total). Among all other species, the one yielding
the highest score was 'Brumimicrobium mesophilum' (DQ660382), which
corresponded to an identity of 92.1% and an HSP coverage of 58.0%.
(Note that the Greengenes database uses the INSDC (=
EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for
nomenclature or classification.) The most frequently occurring
keywords within the labels of all environmental samples which yielded
hits were 'lake' (9.1%), 'tin' (3.4%), 'microbi' (2.5%), 'depth' (2.0%)
and 'tract' (1.7%) (247 hits in total). The most frequently occurring
keywords within those labels of environmental samples which yielded
hits of a higher score than the highest scoring species were 'lake'
(9.2%), 'tin' (3.8%), 'microbi' (2.3%), 'depth' (2.0%) and 'tract'
(1.8%) (169 hits in total). The most frequent keyword 'lake' may
reflect the freshwater origin of strain RW262T, whereas the keywords
'tin' and 'depth' may allude to some as yet unrecognized ecological
features of F. taffensis.
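
The score-weighted keyword frequencies described above can be illustrated with a minimal sketch. It assumes that each hit is a (label, BLAST score) pair, that every keyword is counted once per hit with the weight of that hit's score, and that frequencies are normalized over all keyword weights. The paper does not spell out the exact normalization, stemming [12] is omitted for brevity, and the function name and the example hits are hypothetical.

```python
import re
from collections import defaultdict


def weighted_keyword_frequencies(hits):
    """Return {keyword: percentage of the total score-weighted keyword mass}.

    hits: iterable of (label, score) pairs. This normalization is an
    assumption; the paper only states that frequencies are weighted by score.
    """
    weights = defaultdict(float)
    for label, score in hits:
        # Count each keyword once per hit, weighted by that hit's score.
        for word in set(re.findall(r"[a-z]+", label.lower())):
            weights[word] += score
    total = sum(weights.values())
    if total == 0:
        return {}
    return {word: 100.0 * w / total for word, w in weights.items()}


# Hypothetical hits (label, bit score); these are not data from the paper.
hits = [
    ("freshwater lake sediment", 900.0),
    ("lake water column depth", 600.0),
    ("marine sediment", 300.0),
]
freqs = weighted_keyword_frequencies(hits)
for word, pct in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: {pct:.1f}%")
```

In this toy input 'lake' ranks first because it occurs in the two highest-scoring labels, which mirrors how a habitat keyword can dominate the list for a freshwater isolate.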

Figure 1 shows the phylogenetic neighborhood of F. 
taffensis in a 16S rRNA based tree. The sequences of 
the two identical 16S rRNA gene copies in the ge- 
nome differ by two nucleotides from the previously 
published 16S rRNA sequence (AF493694), which 
contains one ambiguous base call. 

Strain RW262T is strictly aerobic, Gram-negative, motile by gliding and
flexirubin-pigmented [1]. Cells are flexible rods with rounded ends
(Figure 2), 0.4-0.5 µm in diameter and 1.5-5.7 µm in length, with rare
longer filaments of up to 51 µm in length [1]. Growth occurs at 4°C and
20°C, but not in the presence of Na+ ions [1]. Growth of strain RW262T
at 4°C is only weak, so that F. taffensis should not be considered to
be psychrotolerant like the other members of the family [1,2]. Strain
RW262T is capable of DNA hydrolysis [1], is catalase positive but
oxidase negative, able to catalyze the hydrolysis of arginine, aesculin
or starch, whereas it weakly hydrolyzes gelatine [1]. It is negative
for nitrate and nitrite reduction; indole production; β-galactosidase,
urease and xylanase activity; hydrolysis of agar, arginine, aesculin
and starch; and acid production from carbohydrates [1]. The strain is
not able to utilize glucose, arabinose, mannose, mannitol,
N-acetylglucosamine, maltose, gluconate, caprate, adipate, malate,
citrate or phenyl acetate [1]. However, within the genome are several
genes for utilization of complex organic carbon compounds. The strain
is resistant to chloramphenicol (10 µg), streptomycin (10 µg), and
kanamycin (30 µg) but susceptible to penicillin G (10 units),
ampicillin (10 µg), rifampicin (5 µg) and tetracycline (10 µg) [1].

Chemotaxonomy 

The predominant cellular fatty acid of strain RW262T was the
branched-chain saturated fatty acid iso-C15:0 (44.2%) [1]. Unsaturated
branched-chain fatty acids, straight-chain saturated and
mono-unsaturated fatty acids occur only in lower amounts: C14:0 (3.2%),
C15:0 (7.5%), C16:0 (3.0%), iso-C15:1 ω10c (11.8%), iso-C16:1 ω12c
(4.9%). Lipopolysaccharide hydroxy fatty acids constitute 20.4% of the
total cellular fatty acids, mainly composed of iso-C17:0 3-OH (12.3%),
iso-C15:0 3-OH (4.2%) and iso-C15:0 2-OH (3.5%) [1].

[Figure 1 tree image. Taxa shown, with sequence accessions: Lishizhenia tianjinensis (EU183317), Lishizhenia caseinilytica (AB176674), Brumimicrobium glaciale (AF521195), Fluviicola taffensis (IMG2503736905), Wandonia haliotis (FJ424814), Crocinitomix catalasitica (AB078042), Owenweeksia hongkongensis (AB125062), Cryomorpha ignava (AF170738). Scale bar: 0.02.]



Figure 1. Phylogenetic tree highlighting the position of F. taffensis relative to the type strains of the other spe- 
cies within the family Cryomorphaceae. The tree was inferred from 1,429 aligned characters [13,14] of the 
16S rRNA gene sequence under the maximum likelihood (ML) criterion [15]. Rooting was done initially using 
the midpoint method [16] and then checked for its agreement with the current classification (Table 1). The 
branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the 
branches are support values from 300 ML bootstrap replicates [17] (left) and from 1,000 maximum parsimony 
bootstrap replicates [18] (right) if larger than 60%. Lineages with type strain genome sequencing projects regis- 
tered in GOLD [19] are labeled with one asterisk, those also listed as 'Complete and Published' (as well as the 
target genome) with two asterisks. 

Table 1. Classification and general features of F. taffensis RW262T according to the MIGS recommendations [20]
and the NamesforLife database [21].



MIGS ID   Property                Term                                                Evidence code
          Current classification  Domain Bacteria                                     TAS [22]
                                  Phylum Bacteroidetes                                TAS [23]
                                  Class "Flavobacteria"                               TAS [24]
                                  Order "Flavobacteriales"                            TAS [25]
                                  Family Cryomorphaceae                               TAS [2]
                                  Genus Fluviicola                                    TAS [1]
                                  Species Fluviicola taffensis                        TAS [1]
                                  Type strain RW262                                   TAS [1]
          Gram stain              negative                                            TAS [1]
          Cell shape              rod-shaped                                          TAS [1]
          Motility                by gliding                                          TAS [1]
          Sporulation             none                                                TAS [1]
          Temperature range       4°C-25°C                                            TAS [1]
          Optimum temperature     20°C                                                TAS [1]
          Salinity                obligate 0%                                         TAS [1]
MIGS-22   Oxygen requirement      strict aerobe                                       TAS [1]
          Carbon source           probably amino acids; unable to use carbohydrates   NAS
          Energy metabolism       chemoorganotroph                                    TAS [1]
MIGS-6    Habitat                 fresh water                                         TAS [1]
MIGS-15   Biotic relationship     free-living                                         NAS
MIGS-14   Pathogenicity           none                                                NAS
          Biosafety level         1                                                   TAS [26]
          Isolation               fresh river water                                   TAS [1]
MIGS-4    Geographic location     River Taff near Cardiff, UK                         TAS [1]
MIGS-5    Sample collection time  January 2000                                        TAS [1]
MIGS-4.1  Latitude                51.85                                               TAS [1]
MIGS-4.2  Longitude               -2.32                                               TAS [1]
MIGS-4.3  Depth                   not reported
MIGS-4.4  Altitude                sea level                                           NAS



Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable
Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted
property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project
[27].



Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its
phylogenetic position [28], and is part of the Genomic Encyclopedia of
Bacteria and Archaea project [29]. The genome project is deposited in
the Genome On Line Database [19] and the complete genome sequence is
deposited in GenBank. Sequencing, finishing and annotation were
performed by the DOE Joint Genome Institute (JGI). A summary of the
project information is shown in Table 2.

Table 2. Genome sequencing project information



MIGS ID    Property                    Term
MIGS-31    Finishing quality           Finished
MIGS-28    Libraries used              Three genomic libraries: one 454 pyrosequence standard library, one 454 PE library (11 kb insert size), one Illumina library
MIGS-29    Sequencing platforms        Illumina GAii, 454 GS FLX Titanium
MIGS-31.2  Sequencing coverage         351.0 × Illumina; 23.0 × pyrosequence
MIGS-30    Assemblers                  Newbler version 2.3, Velvet, phrap version SPS - 4.24
MIGS-32    Gene calling method         Prodigal 1.4, GenePRIMP
           INSDC ID                    CP002542
           Genbank Date of Release     April 1, 2011
           GOLD ID                     Gc01706
           NCBI project ID             47603
           Database: IMG-GEBA          2503707007
MIGS-13    Source material identifier  DSM 16823
           Project relevance           Tree of Life, GEBA



Growth conditions and DNA isolation 

F. taffensis RW262T, DSM 16823, was grown in DSMZ medium 948 (Oxoid
nutrient medium) [30] at 28°C. DNA was isolated from 0.5-1 g of cell
paste using the JetFlex Genomic DNA Purification kit (GENOMED 600100)
following the standard protocol as recommended by the manufacturer, but
with an additional 20 µl proteinase K incubation (one hour) at 58°C
for improved cell lysis. DNA is available through the DNA Bank Network
[31].

Genome sequencing and assembly 

The genome was sequenced using a combination of Illumina and 454
sequencing platforms. All general aspects of library construction and
sequencing can be found at the JGI website [32]. Pyrosequencing reads
were assembled using the Newbler assembler (Roche). The initial Newbler
assembly, consisting of 51 contigs in one scaffold, was converted into
a phrap [33] assembly by making fake reads from the consensus, to
collect the read pairs in the 454 paired end library. Illumina GAii
sequencing data (801.4 Mb) was assembled with Velvet [34] and the
consensus sequences were shredded into 1.5 kb overlapped fake reads and
assembled together with the 454 data. The 454 draft assembly was based
on 164.9 Mb 454 draft data and all of the 454 paired end data. Newbler
parameters are -consed -a 50 -l 350 -g -m -ml 20. The
Phred/Phrap/Consed software package [33] was used for sequence assembly
and quality assessment in the subsequent finishing process. After the
shotgun stage, reads were assembled with parallel phrap (High
Performance Software, LLC). Possible mis-assemblies were corrected with
gapResolution [32], Dupfinisher [35], or sequencing clones bridging PCR
fragments with subcloning. Gaps between contigs were closed by editing
in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang,
unpublished). A total of 161 additional reactions and shatter libraries
were necessary to close gaps and to raise the quality of the finished
sequence. Illumina reads were also used to correct potential base
errors and increase consensus quality using the software Polisher
developed at JGI [36]. The error rate of the completed genome sequence
was less than 1 in 100,000. Together, the combination of the Illumina
and 454 sequencing platforms provided 374.0 × coverage of the genome.
The final assembly contained 232,904 pyrosequence and 44,902,395
Illumina reads.
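
The shredding of a consensus sequence into overlapping 1.5 kb fake reads can be sketched as follows. This is only an illustration of the idea, not the JGI pipeline: the read length follows the 1.5 kb given in the text, whereas the step size (and therefore the overlap), the function name and the example contig are assumptions.

```python
def shred(consensus, read_len=1500, step=750):
    """Cut a consensus sequence into overlapping pseudo-reads.

    read_len follows the 1.5 kb stated in the text; step, and hence the
    overlap of read_len - step bases, is an assumed value.
    """
    if step <= 0 or step > read_len:
        raise ValueError("step must be between 1 and read_len")
    if not consensus:
        return []
    if len(consensus) <= read_len:
        return [consensus]
    starts = list(range(0, len(consensus) - read_len + 1, step))
    # Add a final window so the end of the contig is always covered.
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)
    return [consensus[s:s + read_len] for s in starts]


contig = "ACGT" * 1000  # hypothetical 4,000 bp contig
reads = shred(contig)
print(len(reads), sorted({len(r) for r in reads}))
```

Overlapping windows let an overlap-based assembler such as phrap rejoin the pseudo-reads with the 454 data, which is presumably why the consensus is shredded rather than added as a single long sequence.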

Genome annotation 

Genes were identified using Prodigal [37] as part of the Oak Ridge
National Laboratory genome annotation pipeline, followed by a round of
manual curation using the JGI GenePRIMP pipeline [38]. The predicted
CDSs were translated and used to search the National Center for
Biotechnology Information (NCBI) non-redundant database, UniProt,
TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional
gene prediction analysis and functional annotation were performed
within the Integrated Microbial Genomes - Expert Review (IMG-ER)
platform [39].

Genome properties

The genome consists of a 4,633,577 bp long chromosome with a G+C
content of 36.5% (Figure 3 and Table 3). Of the 4,131 genes predicted,
4,082 were protein-coding genes, and 49 RNAs; 49 pseudogenes were also
identified. The majority of the protein-coding genes (55.0%) were
assigned a putative function while the remaining ones were annotated as
hypothetical proteins. The distribution of genes into COGs functional
categories is presented in Table 4.
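
As a quick consistency check, the percentages quoted for the genome can be recomputed from the raw counts reported in Table 3. The sketch below assumes that base-pair percentages are taken relative to the genome size and gene percentages relative to the total gene count; the variable and function names are illustrative.

```python
# Counts as reported in Table 3.
genome_size_bp = 4_633_577
gc_bp = 1_691_009
coding_bp = 4_192_830
protein_coding_genes = 4_082
rna_genes = 49
genes_with_function = 2_271

total_genes = protein_coding_genes + rna_genes


def pct(part, whole):
    """Percentage of whole, rounded to two decimals as in Table 3."""
    return round(100.0 * part / whole, 2)


gc_pct = pct(gc_bp, genome_size_bp)
coding_pct = pct(coding_bp, genome_size_bp)
function_pct = pct(genes_with_function, total_genes)

print(f"total genes: {total_genes}")
print(f"G+C content: {gc_pct}%")
print(f"coding region: {coding_pct}%")
print(f"genes with function prediction: {function_pct}%")
```

The recomputed values agree with the G+C content of about 36.5% and the roughly 55% of genes with a predicted function given in the text.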

Figure 3. Graphical circular map of the genome. From outside to the center: Genes on forward strand (color
by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs 
red, other RNAs black), GC content, GC skew. 

Table 3. Genome Statistics



Attribute                         Value      % of Total
Genome size (bp)                  4,633,577  100.00%
DNA coding region (bp)            4,192,830  90.49%
DNA G+C content (bp)              1,691,009  36.49%
Number of replicons               1
Extrachromosomal elements         0
Total genes                       4,131      100.00%
RNA genes                         49         1.19%
rRNA operons                      2
Protein-coding genes              4,082      98.81%
Pseudo genes                      49         1.19%
Genes with function prediction    2,271      54.97%
Genes in paralog clusters         532        12.88%
Genes assigned to COGs            2,169      52.51%
Genes assigned Pfam domains       2,420      58.58%
Genes with signal peptides        1,331      32.22%
Genes with transmembrane helices  911        22.05%
CRISPR repeats                    1





Table 4. Number of genes associated with the general COG functional categories 



Code  Value        % age        Description
J     [illegible]  [illegible]  Translation, ribosomal structure and biogenesis
A     0            0.0          RNA processing and modification
K     212          8.8          Transcription
L     137          5.7          Replication, recombination and repair
B     1            0.0          Chromatin structure and dynamics
D     22           0.9          Cell cycle control, cell division, chromosome partitioning
Y     0            0.0          Nuclear structure
V     57           2.4          Defense mechanisms
T     183          7.6          Signal transduction mechanisms
M     222          9.2          Cell wall/membrane/envelope biogenesis
N     8            0.3          Cell motility
Z     0            0.0          Cytoskeleton
W     0            0.0          Extracellular structures
U     42           1.8          Intracellular trafficking, secretion, and vesicular transport
O     106          4.4          Posttranslational modification, protein turnover, chaperones
C     117          4.9          Energy production and conversion
G     76           3.2          Carbohydrate transport and metabolism
E     136          5.7          Amino acid transport and metabolism
F     63           2.6          Nucleotide transport and metabolism
H     114          4.7          Coenzyme transport and metabolism
I     101          4.2          Lipid transport and metabolism
P     116          4.8          Inorganic ion transport and metabolism
Q     45           1.9          Secondary metabolites biosynthesis, transport and catabolism
R     286          11.9         General function prediction only
S     194          8.1          Function unknown
-     1,962        47.5         Not in COGs